Background: The island city-state on the southern tip of the Malay Peninsula has seen many rulers in its history. It has been an outpost of the ancient Sumatran Srivijaya empire; it was part of the Sultanate of Johore in the 15th and 16th centuries, and it was burned down by Portuguese forces in 1617. At the beginning of the 19th century, Singapore became a trade post and settlement of the British East India Company and a British crown colony some years later.
Singapore is today (beside Monaco) the most densely populated independent country in the world. Its strategic location at the Strait of Malacca and the South China Sea, combined with a mostly corruption-free government, a skilled workforce, pro-foreign investment and export-oriented, led to a thriving free-market economy that attracts international investment funds on a large scale despite its relatively high-cost operating environment.
# !python library_conso.py
# import library_conso # quick import
import numpy as np
import pandas as pd
pd.set_option('display.float_format', '{:.2f}'.format) # ? or change
import seaborn as sns
import matplotlib.pyplot as plt
import requests
import pydrive
import pandas_profiling
from pandas_profiling import ProfileReport
import glob # trying to automate it
import numpy as np
from bokeh.layouts import column
from bokeh.models import Div, TeX
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from sklearn.preprocessing import LabelEncoder
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
%matplotlib inline
output_notebook()
# !dir ..\data\all_hpi
# 03/11/2022 11:05 AM 12,287 metadata-resale-flat-prices.txt
# 03/11/2022 11:05 AM 22,922,820 resale-flat-prices-based-on-approval-date-1990-1999.csv
# 03/11/2022 11:05 AM 29,739,597 resale-flat-prices-based-on-approval-date-2000-feb-2012.csv
# 03/11/2022 11:05 AM 3,108,078 resale-flat-prices-based-on-registration-date-from-jan-2015-to-dec-2016.csv
# 03/11/2022 11:05 AM 11,978,170 resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv
# 03/11/2022 11:05 AM 4,212,975 resale-flat-prices-based-on-registration-date-from-mar-2012-to-dec-2014.csv
#
# could use glob here, but i aldo do care about the timestamps...
# i also have this stored on S3 for direct access instead of local
df1 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-approval-date-1990-1999.csv')
df2 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-approval-date-2000-feb-2012.csv')
df3 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-registration-date-from-mar-2012-to-dec-2014.csv')
df4 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-registration-date-from-jan-2015-to-dec-2016.csv')
df5 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv')
df1.head(); df2.head(); df3.head(); df4.head(); df5.head() # yes, some data csv dont have remaining_lease col...
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990-01 | ANG MO KIO | 1 ROOM | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | IMPROVED | 1977 | 9000 |
| 1 | 1990-01 | ANG MO KIO | 1 ROOM | 309 | ANG MO KIO AVE 1 | 04 TO 06 | 31.00 | IMPROVED | 1977 | 6000 |
| 2 | 1990-01 | ANG MO KIO | 1 ROOM | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | IMPROVED | 1977 | 8000 |
| 3 | 1990-01 | ANG MO KIO | 1 ROOM | 309 | ANG MO KIO AVE 1 | 07 TO 09 | 31.00 | IMPROVED | 1977 | 6000 |
| 4 | 1990-01 | ANG MO KIO | 3 ROOM | 216 | ANG MO KIO AVE 1 | 04 TO 06 | 73.00 | NEW GENERATION | 1976 | 47200 |
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01 | ANG MO KIO | 3 ROOM | 170 | ANG MO KIO AVE 4 | 07 TO 09 | 69.00 | Improved | 1986 | 147000.00 |
| 1 | 2000-01 | ANG MO KIO | 3 ROOM | 174 | ANG MO KIO AVE 4 | 04 TO 06 | 61.00 | Improved | 1986 | 144000.00 |
| 2 | 2000-01 | ANG MO KIO | 3 ROOM | 216 | ANG MO KIO AVE 1 | 07 TO 09 | 73.00 | New Generation | 1976 | 159000.00 |
| 3 | 2000-01 | ANG MO KIO | 3 ROOM | 215 | ANG MO KIO AVE 1 | 07 TO 09 | 73.00 | New Generation | 1976 | 167000.00 |
| 4 | 2000-01 | ANG MO KIO | 3 ROOM | 218 | ANG MO KIO AVE 1 | 07 TO 09 | 67.00 | New Generation | 1976 | 163000.00 |
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2012-03 | ANG MO KIO | 2 ROOM | 172 | ANG MO KIO AVE 4 | 06 TO 10 | 45.00 | Improved | 1986 | 250000.00 |
| 1 | 2012-03 | ANG MO KIO | 2 ROOM | 510 | ANG MO KIO AVE 8 | 01 TO 05 | 44.00 | Improved | 1980 | 265000.00 |
| 2 | 2012-03 | ANG MO KIO | 3 ROOM | 610 | ANG MO KIO AVE 4 | 06 TO 10 | 68.00 | New Generation | 1980 | 315000.00 |
| 3 | 2012-03 | ANG MO KIO | 3 ROOM | 474 | ANG MO KIO AVE 10 | 01 TO 05 | 67.00 | New Generation | 1984 | 320000.00 |
| 4 | 2012-03 | ANG MO KIO | 3 ROOM | 604 | ANG MO KIO AVE 5 | 06 TO 10 | 67.00 | New Generation | 1980 | 321000.00 |
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01 | ANG MO KIO | 3 ROOM | 174 | ANG MO KIO AVE 4 | 07 TO 09 | 60.00 | Improved | 1986 | 70 | 255000.00 |
| 1 | 2015-01 | ANG MO KIO | 3 ROOM | 541 | ANG MO KIO AVE 10 | 01 TO 03 | 68.00 | New Generation | 1981 | 65 | 275000.00 |
| 2 | 2015-01 | ANG MO KIO | 3 ROOM | 163 | ANG MO KIO AVE 4 | 01 TO 03 | 69.00 | New Generation | 1980 | 64 | 285000.00 |
| 3 | 2015-01 | ANG MO KIO | 3 ROOM | 446 | ANG MO KIO AVE 10 | 01 TO 03 | 68.00 | New Generation | 1979 | 63 | 290000.00 |
| 4 | 2015-01 | ANG MO KIO | 3 ROOM | 557 | ANG MO KIO AVE 10 | 07 TO 09 | 68.00 | New Generation | 1980 | 64 | 290000.00 |
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | resale_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01 | ANG MO KIO | 2 ROOM | 406 | ANG MO KIO AVE 10 | 10 TO 12 | 44.00 | Improved | 1979 | 61 years 04 months | 232000.00 |
| 1 | 2017-01 | ANG MO KIO | 3 ROOM | 108 | ANG MO KIO AVE 4 | 01 TO 03 | 67.00 | New Generation | 1978 | 60 years 07 months | 250000.00 |
| 2 | 2017-01 | ANG MO KIO | 3 ROOM | 602 | ANG MO KIO AVE 5 | 01 TO 03 | 67.00 | New Generation | 1980 | 62 years 05 months | 262000.00 |
| 3 | 2017-01 | ANG MO KIO | 3 ROOM | 465 | ANG MO KIO AVE 10 | 04 TO 06 | 68.00 | New Generation | 1980 | 62 years 01 month | 265000.00 |
| 4 | 2017-01 | ANG MO KIO | 3 ROOM | 601 | ANG MO KIO AVE 5 | 01 TO 03 | 67.00 | New Generation | 1980 | 62 years 05 months | 265000.00 |
df = df1.append(df2).append(df3).append(df4).append(df5).reset_index(drop=True) # .sort_values('whatever')
df.tail()
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 867886 | 2022-03 | YISHUN | 5 ROOM | 501B | YISHUN ST 51 | 07 TO 09 | 112.00 | Improved | 2018 | 652000.00 | 94 years 11 months |
| 867887 | 2022-03 | YISHUN | 5 ROOM | 820 | YISHUN ST 81 | 07 TO 09 | 122.00 | Improved | 1988 | 585000.00 | 65 years 06 months |
| 867888 | 2022-03 | YISHUN | EXECUTIVE | 359 | YISHUN RING RD | 01 TO 03 | 145.00 | Apartment | 1988 | 682000.00 | 65 years 05 months |
| 867889 | 2022-03 | YISHUN | EXECUTIVE | 387 | YISHUN RING RD | 04 TO 06 | 146.00 | Maisonette | 1988 | 720000.00 | 65 years 04 months |
| 867890 | 2022-03 | YISHUN | EXECUTIVE | 277 | YISHUN ST 22 | 10 TO 12 | 146.00 | Maisonette | 1985 | 788000.00 | 62 years 06 months |
assert len(df) == len(df1) + len(df2) + len(df3) + len(df4) + len(df5), "Warning: During merge, you lose some data rows..."
assert len(df) == 867891
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 867891 entries, 0 to 867890 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 month 867891 non-null object 1 town 867891 non-null object 2 flat_type 867891 non-null object 3 block 867891 non-null object 4 street_name 867891 non-null object 5 storey_range 867891 non-null object 6 floor_area_sqm 867891 non-null float64 7 flat_model 867891 non-null object 8 lease_commence_date 867891 non-null int64 9 resale_price 867891 non-null float64 10 remaining_lease 158841 non-null object dtypes: float64(2), int64(1), object(8) memory usage: 72.8+ MB
df.dtypes
month object town object flat_type object block object street_name object storey_range object floor_area_sqm float64 flat_model object lease_commence_date int64 resale_price float64 remaining_lease object dtype: object
df.resale_price.describe()
count 867891.00 mean 303752.12 std 156280.58 min 5000.00 25% 186000.00 50% 283000.00 75% 395000.00 max 1360000.00 Name: resale_price, dtype: float64
for c in df.columns: print(c) # there are all the cols I have to work with initially
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price remaining_lease
df.town.unique().tolist() # len(df.town.unique()) - > 27 unique towns
['ANG MO KIO', 'BEDOK', 'BISHAN', 'BUKIT BATOK', 'BUKIT MERAH', 'BUKIT TIMAH', 'CENTRAL AREA', 'CHOA CHU KANG', 'CLEMENTI', 'GEYLANG', 'HOUGANG', 'JURONG EAST', 'JURONG WEST', 'KALLANG/WHAMPOA', 'MARINE PARADE', 'QUEENSTOWN', 'SENGKANG', 'SERANGOON', 'TAMPINES', 'TOA PAYOH', 'WOODLANDS', 'YISHUN', 'LIM CHU KANG', 'SEMBAWANG', 'BUKIT PANJANG', 'PASIR RIS', 'PUNGGOL']
df['town'] = df['town'].str.title() # capitalize the first letter of each word in town
df.town.unique().tolist() # len(df.town.unique()) - > 27 unique towns
['Ang Mo Kio', 'Bedok', 'Bishan', 'Bukit Batok', 'Bukit Merah', 'Bukit Timah', 'Central Area', 'Choa Chu Kang', 'Clementi', 'Geylang', 'Hougang', 'Jurong East', 'Jurong West', 'Kallang/Whampoa', 'Marine Parade', 'Queenstown', 'Sengkang', 'Serangoon', 'Tampines', 'Toa Payoh', 'Woodlands', 'Yishun', 'Lim Chu Kang', 'Sembawang', 'Bukit Panjang', 'Pasir Ris', 'Punggol']
# read list of towns from wikipedia for sanity
# interesting: https://www.hdb.gov.sg/cs/infoweb/about-us/history
# As of 2021, there in 23 towns and 3 estates across Singapore. These statistics reflect the boundaries
# of HDB towns and are not necessarily the same as planning area statistics.
towns_singapore_scraped_wikipedia = pd.read_html('https://en.wikipedia.org/wiki/New_towns_of_Singapore')
towns_singapore_scraped_wikipedia[0]
| Name (English/Malay) | Chinese | Pinyin | Tamil | Total area (km2) | Residential area (km2) | Dwelling units | Projected ultimate | Population | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Ang Mo Kio | 宏茂桥 | hóngmàoqiáo | ஆங் மோ கியோ | 6.38 | 2.83 | 50726 | 58000 | 138200 |
| 1 | Bedok | 勿洛 | wùluò | பிடோ | 9.37 | 4.18 | 62816 | 79000 | 187900 |
| 2 | Bishan | 碧山 | bìshān | பீஷான் | 6.90 | 1.72 | 20072 | 34000 | 61100 |
| 3 | Bukit Batok | 武吉巴督 | wǔjíbādū | புக்கிட் பாத்தோக் | 7.85 | 2.91 | 44285 | 54000 | 121400 |
| 4 | Bukit Merah | 红山 | hóngshān | புக்கிட் மேரா | 8.58 | 3.12 | 54227 | 68000 | 141400 |
| 5 | Bukit Panjang | 武吉班让 | wǔjíbānràng | புக்கிட் பாஞ்சாங் | 4.89 | 2.19 | 35325 | 44000 | 118900 |
| 6 | Choa Chu Kang | 蔡厝港 | càicuògǎng | சுவா சூ காங் | 5.83 | 3.07 | 48900 | 62000 | 167200 |
| 7 | Clementi | 金文泰 | jīnwéntài | கிளிமெண்டி | 4.12 | 2.03 | 26730 | 39000 | 69500 |
| 8 | Geylang | 芽笼 | yálóng | கேலாங் | 6.78 | 2.14 | 30892 | 50000 | 86000 |
| 9 | Hougang | 后港 | hòugǎng | ஹவ்காங் | 13.09 | 3.67 | 57272 | 72000 | 179700 |
| 10 | Jurong East | 裕廊东 | yùlángdōng | ஜூரோங் | 3.84 | 1.65 | 24122 | 31000 | 75400 |
| 11 | Jurong West | 裕廊西 | yùlángxī | ஜூரோங் | 9.87 | 4.80 | 75208 | 94000 | 253800 |
| 12 | Kallang/Whampoa | 加冷/黄浦 | jiālĕng/huángpǔ | காலாங் | 7.99 | 2.10 | 39931 | 57000 | 105200 |
| 13 | Pasir Ris | 巴西立 | bāxīlì | பாசிர் ரிஸ் | 6.01 | 3.18 | 29654 | 44000 | 106600 |
| 14 | Punggol | 榜鵝 | bǎng'é | பொங்கோல் | 9.57 | 3.74 | 50663 | 96000 | 146900 |
| 15 | Queenstown | 女皇镇 | nǚhuángzhèn | குவீன்ஸ்டவுன் | 6.94 | 2.10 | 33164 | 60000 | 81200 |
| 16 | Sembawang | 三巴旺 | sānbāwàng | செம்பவாங் | 7.08 | 3.31 | 30020 | 65000 | 81500 |
| 17 | Sengkang | 盛港 | shènggǎng | செங்காங | 10.55 | 3.97 | 69196 | 96000 | 217700 |
| 18 | Serangoon | 实龙岗 | shílónggāng | சிராங்கூன் | 7.37 | 1.63 | 21632 | 30000 | 66800 |
| 19 | Tampines | 淡滨尼 | dànbīnní | தெம்பினிஸ் | 12.00 | 5.49 | 72683 | 110000 | 232700 |
| 20 | Toa Payoh | 大巴窑 | dàbāyáo | தோ பாயோ | 5.56 | 2.48 | 39737 | 61000 | 103800 |
| 21 | Woodlands | 兀兰 | wùlán | ஊட்லண்ட்ஸ் | 11.98 | 4.80 | 69900 | 102000 | 243300 |
| 22 | Yishun | 义顺 | yìshùn | யீஷூன் | 7.78 | 3.98 | 65158 | 84000 | 198500 |
towns_singapore_scraped_wikipedia[1] # estates ?
| Name (English/Malay) | Chinese | Pinyin | Tamil | Dwelling units | Population | |
|---|---|---|---|---|---|---|
| 0 | Bukit Timah | 武吉知马 | wūjízhīmǎ | புக்கித் திமா | 2423 | 8100 |
| 1 | Marine Parade | 马林百列 | mǎlínbǎiliè | மரின் பரேட் | 6537 | 20800 |
| 2 | Central Area | 新加坡中區 | xīnjiāpōzhōngqū | சிங்கப்பூர் மாவட்டம் | 9459 | 27200 |
df.flat_type.unique().tolist()
# Multi and Multi the same ?
['1 ROOM', '3 ROOM', '4 ROOM', '5 ROOM', '2 ROOM', 'EXECUTIVE', 'MULTI GENERATION', 'MULTI-GENERATION']
df.flat_type.value_counts()
4 ROOM 326735 3 ROOM 281589 5 ROOM 181442 EXECUTIVE 65888 2 ROOM 10423 1 ROOM 1285 MULTI GENERATION 279 MULTI-GENERATION 250 Name: flat_type, dtype: int64
len(df.block.unique()) # there are many unique blocks, mainly numbers, but do include LETTERS as well
# thus it is object/string
2578
len(df.street_name.unique().tolist()) # 577 unique street names...
577
df.storey_range.unique().tolist()
['10 TO 12', '04 TO 06', '07 TO 09', '01 TO 03', '13 TO 15', '19 TO 21', '16 TO 18', '25 TO 27', '22 TO 24', '28 TO 30', '31 TO 33', '40 TO 42', '37 TO 39', '34 TO 36', '06 TO 10', '01 TO 05', '11 TO 15', '16 TO 20', '21 TO 25', '26 TO 30', '36 TO 40', '31 TO 35', '46 TO 48', '43 TO 45', '49 TO 51']
df.storey_range.value_counts()
04 TO 06 219125 07 TO 09 197571 01 TO 03 176158 10 TO 12 167760 13 TO 15 56354 16 TO 18 21500 19 TO 21 10247 22 TO 24 6663 25 TO 27 2911 01 TO 05 2700 06 TO 10 2474 28 TO 30 1278 11 TO 15 1259 31 TO 33 427 34 TO 36 392 37 TO 39 380 16 TO 20 265 40 TO 42 192 21 TO 25 92 43 TO 45 44 26 TO 30 39 46 TO 48 38 49 TO 51 13 36 TO 40 7 31 TO 35 2 Name: storey_range, dtype: int64
# this is tricky. you can label encode this or whatever, but maybe worth creating
# a couple new entries in df. maybe three features:
# the first value, the last value, and the average value...
# i.e. 04 - 06 would have new col for 4, 6, and 5...
df['storey_first'] = df['storey_range'].str.split().str[0]
df['storey_last'] = df['storey_range'].str.split().str[2]
df['storey_first'] = pd.to_numeric(df['storey_first'])
df['storey_last'] = pd.to_numeric(df['storey_last'])
df['storey_avg'] = (df['storey_first'] + df['storey_last'])/2
df.columns
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
'floor_area_sqm', 'flat_model', 'lease_commence_date', 'resale_price',
'remaining_lease', 'storey_first', 'storey_last', 'storey_avg'],
dtype='object')
df.flat_model.unique().tolist()
# we must assume these have many duplicated...
# i.e. IMPROVED actually equals Improved
['IMPROVED', 'NEW GENERATION', 'MODEL A', 'STANDARD', 'SIMPLIFIED', 'MODEL A-MAISONETTE', 'APARTMENT', 'MAISONETTE', 'TERRACE', '2-ROOM', 'IMPROVED-MAISONETTE', 'MULTI GENERATION', 'PREMIUM APARTMENT', 'Improved', 'New Generation', 'Model A', 'Standard', 'Apartment', 'Simplified', 'Model A-Maisonette', 'Maisonette', 'Multi Generation', 'Adjoined flat', 'Premium Apartment', 'Terrace', 'Improved-Maisonette', 'Premium Maisonette', '2-room', 'Model A2', 'DBSS', 'Type S1', 'Type S2', 'Premium Apartment Loft']
df['flat_model'] = df['flat_model'].str.lower()
df.flat_model.unique().tolist()
['improved', 'new generation', 'model a', 'standard', 'simplified', 'model a-maisonette', 'apartment', 'maisonette', 'terrace', '2-room', 'improved-maisonette', 'multi generation', 'premium apartment', 'adjoined flat', 'premium maisonette', 'model a2', 'dbss', 'type s1', 'type s2', 'premium apartment loft']
df.flat_model.value_counts()
model a 242173 improved 227656 new generation 182208 simplified 55353 standard 41047 premium apartment 40355 apartment 33607 maisonette 28192 model a2 9490 dbss 2596 model a-maisonette 1974 adjoined flat 1155 terrace 670 multi generation 529 type s1 360 type s2 186 improved-maisonette 118 premium maisonette 85 premium apartment loft 84 2-room 53 Name: flat_model, dtype: int64
# wtf
df.columns
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
'floor_area_sqm', 'flat_model', 'lease_commence_date', 'resale_price',
'remaining_lease', 'storey_first', 'storey_last', 'storey_avg'],
dtype='object')
df.floor_area_sqm.describe()
count 867891.00 mean 95.68 std 25.96 min 28.00 25% 73.00 50% 93.00 75% 113.00 max 307.00 Name: floor_area_sqm, dtype: float64
# create new variable of sqft for us unsophisticated Americans
df['floor_area_sqft'] = df['floor_area_sqm'] * 10.7639
df.floor_area_sqft.describe()
count 867891.00 mean 1029.93 std 279.42 min 301.39 25% 785.76 50% 1001.04 75% 1216.32 max 3304.52 Name: floor_area_sqft, dtype: float64
df.columns
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
'floor_area_sqm', 'flat_model', 'lease_commence_date', 'resale_price',
'remaining_lease', 'storey_first', 'storey_last', 'storey_avg',
'floor_area_sqft'],
dtype='object')
df.resale_price.describe()
count 867891.00 mean 303752.12 std 156280.58 min 5000.00 25% 186000.00 50% 283000.00 75% 395000.00 max 1360000.00 Name: resale_price, dtype: float64
df.flat_type.unique() # maybe convert the E and the Multi-G to like 6 ROOM and 7 ROOM ?
# figure out way to quantify and rank, you can just label encode this.
# make sure they are in numeric significant 'order' ranked, etc crap
array(['1 ROOM', '3 ROOM', '4 ROOM', '5 ROOM', '2 ROOM', 'EXECUTIVE',
'MULTI GENERATION', 'MULTI-GENERATION'], dtype=object)
df.lease_commence_date.unique()
array([1977, 1976, 1978, 1979, 1984, 1980, 1985, 1981, 1982, 1986, 1972,
1983, 1973, 1969, 1975, 1971, 1974, 1967, 1970, 1968, 1988, 1987,
1989, 1990, 1992, 1993, 1994, 1991, 1995, 1996, 1997, 1998, 1999,
2000, 2001, 1966, 2002, 2006, 2003, 2005, 2004, 2008, 2007, 2009,
2010, 2012, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
dtype=int64)
df.lease_commence_date.value_counts()
1985 84260 1984 61034 1988 48699 1987 41179 1978 39833 1986 37131 1989 30943 1980 30918 1979 29820 1997 29368 1993 26992 1996 26782 1998 24187 1983 22235 1976 20674 1981 20110 1977 19976 1999 19399 1992 19338 1995 18137 2001 17845 2000 17007 1975 16704 2003 14287 1974 14022 1982 12800 2002 11468 1970 10981 1990 8915 1973 8302 1969 8091 2004 7706 1971 7626 2015 7623 1994 6229 1967 5939 1972 5586 2016 4445 2013 4322 2012 4013 2005 3617 2014 2829 2006 2815 2017 2650 2011 2238 2008 1862 1968 1822 2009 1559 1991 1192 2010 1159 2007 836 2018 322 2019 34 1966 30 Name: lease_commence_date, dtype: int64
# df.remaining_lease.unique()
# snippet
# array([nan, 70.0, 65.0, 64.0, 63.0, 62.0, 69.0, 60.0, 61.0, 86.0, 77.0,
# 80.0, 90.0, 87.0, 66.0, 58.0, 94.0, 71.0, 68.0, 84.0, 73.0, 79.0,
# 76.0, 72.0, 82.0, 74.0, 67.0, 88.0, 81.0, 89.0, 53.0, 54.0, 55.0,
# 57.0, 93.0, 83.0, 85.0, 92.0, 91.0, 59.0, 95.0, 52.0, 51.0, 56.0,
# 75.0, 96.0, 78.0, 50.0, 97.0, 49.0, 48.0, '61 years 04 months',
# '60 years 07 months', '62 years 05 months', '62 years 01 month',
# '63 years', '61 years 06 months', '58 years 04 months',
# '59 years 08 months', '59 years 06 months', '60 years',
# '62 years 08 months', '61 years', '60 years 10 months',
# '59 years 03 months', '61 years 05 months', '60 years 04 months',
# '62 years', '60 years 03 months', '63 years 09 months',
# '61 years 01 month', '61 years 10 months', '58 years 06 months',
# '59 years 04 months', '62 years 11 months', '60 years 08 months',
# '93 years 08 months', '93 years 07 months', '60 years 01 month',
df.remaining_lease.value_counts
<bound method IndexOpsMixin.value_counts of 0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
867886 94 years 11 months
867887 65 years 06 months
867888 65 years 05 months
867889 65 years 04 months
867890 62 years 06 months
Name: remaining_lease, Length: 867891, dtype: object>
# one approach:
# we really most likely care about the number of years..
# break out the values like: 94 years 11 months and push
# to new col years and new col months (or call it a decimal of years)
# 94 years 11 months
df[df['remaining_lease'].str.contains("years", na=False)].head()
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease | storey_first | storey_last | storey_avg | floor_area_sqft | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 746203 | 2017-01 | Ang Mo Kio | 2 ROOM | 406 | ANG MO KIO AVE 10 | 10 TO 12 | 44.00 | improved | 1979 | 232000.00 | 61 years 04 months | 10 | 12 | 11.00 | 473.61 |
| 746204 | 2017-01 | Ang Mo Kio | 3 ROOM | 108 | ANG MO KIO AVE 4 | 01 TO 03 | 67.00 | new generation | 1978 | 250000.00 | 60 years 07 months | 1 | 3 | 2.00 | 721.18 |
| 746205 | 2017-01 | Ang Mo Kio | 3 ROOM | 602 | ANG MO KIO AVE 5 | 01 TO 03 | 67.00 | new generation | 1980 | 262000.00 | 62 years 05 months | 1 | 3 | 2.00 | 721.18 |
| 746206 | 2017-01 | Ang Mo Kio | 3 ROOM | 465 | ANG MO KIO AVE 10 | 04 TO 06 | 68.00 | new generation | 1980 | 265000.00 | 62 years 01 month | 4 | 6 | 5.00 | 731.95 |
| 746207 | 2017-01 | Ang Mo Kio | 3 ROOM | 601 | ANG MO KIO AVE 5 | 01 TO 03 | 67.00 | new generation | 1980 | 265000.00 | 62 years 05 months | 1 | 3 | 2.00 | 721.18 |
# df[df['remaining_lease'].str.contains("years", na=False)]['remaining_lease']
# not sure what makes sense to do here...
profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_notebook_iframe()
Summarize dataset: 0%| | 0/5 [00:00<?, ?it/s]
Generate report structure: 0%| | 0/1 [00:00<?, ?it/s]
Render HTML: 0%| | 0/1 [00:00<?, ?it/s]
# 11004
labelencoder = LabelEncoder()
df['flat_type_label'] = labelencoder.fit_transform(df['flat_type'])
df
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease | storey_first | storey_last | storey_avg | floor_area_sqft | flat_type_label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990-01 | Ang Mo Kio | 1 ROOM | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | improved | 1977 | 9000.00 | NaN | 10 | 12 | 11.00 | 333.68 | 0 |
| 1 | 1990-01 | Ang Mo Kio | 1 ROOM | 309 | ANG MO KIO AVE 1 | 04 TO 06 | 31.00 | improved | 1977 | 6000.00 | NaN | 4 | 6 | 5.00 | 333.68 | 0 |
| 2 | 1990-01 | Ang Mo Kio | 1 ROOM | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | improved | 1977 | 8000.00 | NaN | 10 | 12 | 11.00 | 333.68 | 0 |
| 3 | 1990-01 | Ang Mo Kio | 1 ROOM | 309 | ANG MO KIO AVE 1 | 07 TO 09 | 31.00 | improved | 1977 | 6000.00 | NaN | 7 | 9 | 8.00 | 333.68 | 0 |
| 4 | 1990-01 | Ang Mo Kio | 3 ROOM | 216 | ANG MO KIO AVE 1 | 04 TO 06 | 73.00 | new generation | 1976 | 47200.00 | NaN | 4 | 6 | 5.00 | 785.76 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 867886 | 2022-03 | Yishun | 5 ROOM | 501B | YISHUN ST 51 | 07 TO 09 | 112.00 | improved | 2018 | 652000.00 | 94 years 11 months | 7 | 9 | 8.00 | 1205.56 | 4 |
| 867887 | 2022-03 | Yishun | 5 ROOM | 820 | YISHUN ST 81 | 07 TO 09 | 122.00 | improved | 1988 | 585000.00 | 65 years 06 months | 7 | 9 | 8.00 | 1313.20 | 4 |
| 867888 | 2022-03 | Yishun | EXECUTIVE | 359 | YISHUN RING RD | 01 TO 03 | 145.00 | apartment | 1988 | 682000.00 | 65 years 05 months | 1 | 3 | 2.00 | 1560.77 | 5 |
| 867889 | 2022-03 | Yishun | EXECUTIVE | 387 | YISHUN RING RD | 04 TO 06 | 146.00 | maisonette | 1988 | 720000.00 | 65 years 04 months | 4 | 6 | 5.00 | 1571.53 | 5 |
| 867890 | 2022-03 | Yishun | EXECUTIVE | 277 | YISHUN ST 22 | 10 TO 12 | 146.00 | maisonette | 1985 | 788000.00 | 62 years 06 months | 10 | 12 | 11.00 | 1571.53 | 5 |
867891 rows × 16 columns
df.drop('flat_type', axis=1, inplace=True)
df.street_name.unique() # large large number of them
array(['ANG MO KIO AVE 1', 'ANG MO KIO AVE 3', 'ANG MO KIO AVE 4',
'ANG MO KIO AVE 10', 'ANG MO KIO AVE 5', 'ANG MO KIO AVE 8',
'ANG MO KIO AVE 6', 'ANG MO KIO AVE 9', 'ANG MO KIO AVE 2',
'BEDOK RESERVOIR RD', 'BEDOK NTH ST 3', 'BEDOK STH RD',
'NEW UPP CHANGI RD', 'BEDOK NTH RD', 'BEDOK STH AVE 1',
'CHAI CHEE RD', 'CHAI CHEE DR', 'BEDOK NTH AVE 4',
'BEDOK STH AVE 3', 'BEDOK STH AVE 2', 'BEDOK NTH ST 2',
'BEDOK NTH ST 4', 'BEDOK NTH AVE 2', 'BEDOK NTH AVE 3',
'BEDOK NTH AVE 1', 'BEDOK NTH ST 1', 'CHAI CHEE ST', 'SIN MING RD',
'SHUNFU RD', 'BT BATOK ST 11', 'BT BATOK WEST AVE 8',
'BT BATOK WEST AVE 6', 'BT BATOK ST 21', 'BT BATOK EAST AVE 5',
'BT BATOK EAST AVE 4', 'HILLVIEW AVE', 'BT BATOK CTRL',
'BT BATOK ST 31', 'BT BATOK EAST AVE 3', 'TAMAN HO SWEE',
'TELOK BLANGAH CRES', 'BEO CRES', 'TELOK BLANGAH DR', 'DEPOT RD',
'TELOK BLANGAH RISE', 'JLN BT MERAH', 'HENDERSON RD', 'INDUS RD',
'BT MERAH VIEW', 'HENDERSON CRES', 'BT PURMEI RD',
'TELOK BLANGAH HTS', 'EVERTON PK', 'KG BAHRU HILL', 'REDHILL CL',
'HOY FATT RD', 'HAVELOCK RD', 'JLN KLINIK', 'JLN RUMAH TINGGI',
'JLN BT HO SWEE', 'KIM CHENG ST', 'MOH GUAN TER',
'TELOK BLANGAH WAY', 'KIM TIAN RD', 'KIM TIAN PL', 'EMPRESS RD',
"QUEEN'S RD", 'FARRER RD', 'JLN KUKOH', 'OUTRAM PK', 'SHORT ST',
'SELEGIE RD', 'UPP CROSS ST', 'WATERLOO ST', 'QUEEN ST',
'BUFFALO RD', 'ROWELL RD', 'ROCHOR RD', 'BAIN ST', 'SMITH ST',
'VEERASAMY RD', 'TECK WHYE AVE', 'TECK WHYE LANE',
'CLEMENTI AVE 3', 'WEST COAST DR', 'CLEMENTI AVE 2',
'CLEMENTI AVE 5', 'CLEMENTI AVE 4', 'CLEMENTI AVE 1',
'WEST COAST RD', 'CLEMENTI WEST ST 1', 'CLEMENTI WEST ST 2',
'CLEMENTI ST 13', "C'WEALTH AVE WEST", 'CLEMENTI AVE 6',
'CLEMENTI ST 14', 'CIRCUIT RD', 'MACPHERSON LANE',
'JLN PASAR BARU', 'GEYLANG SERAI', 'EUNOS CRES', 'SIMS DR',
'ALJUNIED CRES', 'GEYLANG EAST AVE 1', 'DAKOTA CRES', 'PINE CL',
'HAIG RD', 'BALAM RD', 'JLN DUA', 'GEYLANG EAST CTRL',
'EUNOS RD 5', 'HOUGANG AVE 3', 'HOUGANG AVE 5', 'HOUGANG AVE 1',
'HOUGANG ST 22', 'HOUGANG AVE 10', 'LOR AH SOO', 'HOUGANG ST 11',
'HOUGANG AVE 7', 'HOUGANG ST 21', 'TEBAN GDNS RD',
'JURONG EAST AVE 1', 'JURONG EAST ST 32', 'JURONG EAST ST 13',
'JURONG EAST ST 21', 'JURONG EAST ST 24', 'JURONG EAST ST 31',
'PANDAN GDNS', 'YUNG KUANG RD', 'HO CHING RD', 'HU CHING RD',
'BOON LAY DR', 'BOON LAY AVE', 'BOON LAY PL', 'JURONG WEST ST 52',
'JURONG WEST ST 41', 'JURONG WEST AVE 1', 'JURONG WEST ST 42',
'JLN BATU', "ST. GEORGE'S RD", 'NTH BRIDGE RD', 'FRENCH RD',
'BEACH RD', 'WHAMPOA DR', 'UPP BOON KENG RD', 'BENDEMEER RD',
'WHAMPOA WEST', 'LOR LIMAU', 'KALLANG BAHRU', 'GEYLANG BAHRU',
'DORSET RD', 'OWEN RD', 'KG ARANG RD', 'JLN BAHAGIA',
'MOULMEIN RD', 'TOWNER RD', 'JLN RAJAH', 'KENT RD', 'AH HOOD RD',
"KING GEORGE'S AVE", 'CRAWFORD LANE', 'MARINE CRES', 'MARINE DR',
'MARINE TER', "C'WEALTH CL", "C'WEALTH DR", 'TANGLIN HALT RD',
"C'WEALTH CRES", 'DOVER RD', 'MARGARET DR', 'GHIM MOH RD',
'DOVER CRES', 'STIRLING RD', 'MEI LING ST', 'HOLLAND CL',
'HOLLAND AVE', 'HOLLAND DR', 'DOVER CL EAST',
'SELETAR WEST FARMWAY 6', 'LOR LEW LIAN', 'SERANGOON NTH AVE 1',
'SERANGOON AVE 2', 'SERANGOON AVE 4', 'SERANGOON CTRL',
'TAMPINES ST 11', 'TAMPINES ST 21', 'TAMPINES ST 91',
'TAMPINES ST 81', 'TAMPINES AVE 4', 'TAMPINES ST 22',
'TAMPINES ST 12', 'TAMPINES ST 23', 'TAMPINES ST 24',
'TAMPINES ST 41', 'TAMPINES ST 82', 'TAMPINES ST 83',
'TAMPINES AVE 5', 'LOR 2 TOA PAYOH', 'LOR 8 TOA PAYOH',
'LOR 1 TOA PAYOH', 'LOR 5 TOA PAYOH', 'LOR 3 TOA PAYOH',
'LOR 7 TOA PAYOH', 'TOA PAYOH EAST', 'LOR 4 TOA PAYOH',
'TOA PAYOH CTRL', 'TOA PAYOH NTH', 'POTONG PASIR AVE 3',
'POTONG PASIR AVE 1', 'UPP ALJUNIED LANE', 'JOO SENG RD',
'MARSILING LANE', 'MARSILING DR', 'MARSILING RISE',
'MARSILING CRES', 'WOODLANDS CTR RD', 'WOODLANDS ST 13',
'WOODLANDS ST 11', 'YISHUN RING RD', 'YISHUN AVE 5',
'YISHUN ST 72', 'YISHUN ST 11', 'YISHUN ST 21', 'YISHUN ST 22',
'YISHUN AVE 3', 'CHAI CHEE AVE', 'ZION RD', 'LENGKOK BAHRU',
'SPOTTISWOODE PK RD', 'NEW MKT RD', 'TG PAGAR PLAZA',
'KELANTAN RD', 'PAYA LEBAR WAY', 'UBI AVE 1', 'SIMS AVE',
'YUNG PING RD', 'TAO CHING RD', 'GLOUCESTER RD', 'BOON KENG RD',
'WHAMPOA STH', 'CAMBRIDGE RD', 'TAMPINES ST 42', 'LOR 6 TOA PAYOH',
'KIM KEAT AVE', 'YISHUN AVE 6', 'YISHUN AVE 9', 'YISHUN ST 71',
'BT BATOK ST 32', 'SILAT AVE', 'TIONG BAHRU RD', 'SAGO LANE',
"ST. GEORGE'S LANE", 'LIM CHU KANG RD', "C'WEALTH AVE",
"QUEEN'S CL", 'SERANGOON AVE 3', 'POTONG PASIR AVE 2',
'WOODLANDS AVE 1', 'YISHUN AVE 4', 'LOWER DELTA RD', 'NILE RD',
'JLN MEMBINA BARAT', 'JLN BERSEH', 'CHANDER RD', 'CASSIA CRES',
'OLD AIRPORT RD', 'ALJUNIED RD', 'BUANGKOK STH FARMWAY 1',
'BT BATOK ST 33', 'ALEXANDRA RD', 'CHIN SWEE RD', 'SIMS PL',
'HOUGANG AVE 2', 'HOUGANG AVE 8', 'SEMBAWANG RD', 'SIMEI ST 1',
'BT BATOK ST 34', 'BT MERAH CTRL', 'LIM LIAK ST', 'JLN TENTERAM',
'WOODLANDS ST 32', 'SIN MING AVE', 'BT BATOK ST 52', 'DELTA AVE',
'PIPIT RD', 'HOUGANG AVE 4', 'QUEENSWAY', 'YISHUN ST 61',
'BISHAN ST 12', "JLN MA'MOR", 'TAMPINES ST 44', 'TAMPINES ST 43',
'BISHAN ST 13', 'JLN DUSUN', 'YISHUN AVE 2', 'JOO CHIAT RD',
'EAST COAST RD', 'REDHILL RD', 'KIM PONG RD', 'RACE COURSE RD',
'KRETA AYER RD', 'HOUGANG ST 61', 'TESSENSOHN RD', 'MARSILING RD',
'YISHUN ST 81', 'BT BATOK ST 51', 'BT BATOK WEST AVE 4',
'BT BATOK WEST AVE 2', 'JURONG WEST ST 91', 'JURONG WEST ST 81',
'GANGSA RD', 'MCNAIR RD', 'SIMEI ST 4', 'YISHUN AVE 7',
'SERANGOON NTH AVE 2', 'YISHUN AVE 11', 'BANGKIT RD',
'JURONG WEST ST 73', 'OUTRAM HILL', 'HOUGANG AVE 6',
'PASIR RIS ST 12', 'PENDING RD', 'PETIR RD', 'LOR 3 GEYLANG',
'BISHAN ST 11', 'PASIR RIS DR 6', 'BISHAN ST 23',
'JURONG WEST ST 92', 'PASIR RIS ST 11', 'YISHUN CTRL',
'BISHAN ST 22', 'SIMEI RD', 'TAMPINES ST 84', 'BT PANJANG RING RD',
'JURONG WEST ST 93', 'FAJAR RD', 'WOODLANDS ST 81',
'CHOA CHU KANG CTRL', 'PASIR RIS ST 51', 'HOUGANG ST 52',
'CASHEW RD', 'TOH YI DR', 'HOUGANG CTRL', 'KG KAYU RD',
'TAMPINES AVE 8', 'TAMPINES ST 45', 'SIMEI ST 2',
'WOODLANDS AVE 3', 'LENGKONG TIGA', 'WOODLANDS ST 82',
'SERANGOON NTH AVE 4', 'SERANGOON CTRL DR', 'BRIGHT HILL DR',
'SAUJANA RD', 'CHOA CHU KANG AVE 3', 'TAMPINES AVE 9',
'JURONG WEST ST 51', 'YUNG HO RD', 'SERANGOON AVE 1',
'PASIR RIS ST 41', 'GEYLANG EAST AVE 2', 'CHOA CHU KANG AVE 2',
'KIM KEAT LINK', 'PASIR RIS DR 4', 'PASIR RIS ST 21',
'SENG POH RD', 'HOUGANG ST 51', 'JURONG WEST ST 72',
'JURONG WEST ST 71', 'PASIR RIS ST 52', 'TAMPINES ST 32',
'CHOA CHU KANG AVE 4', 'CHOA CHU KANG LOOP', 'JLN TENAGA',
'TAMPINES CTRL 1', 'TAMPINES ST 33', 'BT BATOK WEST AVE 7',
'JURONG WEST AVE 5', 'TAMPINES AVE 7', 'WOODLANDS ST 83',
'CHOA CHU KANG ST 51', 'PASIR RIS DR 3', 'YISHUN CTRL 1',
'CHOA CHU KANG AVE 1', 'WOODLANDS ST 31', 'BT MERAH LANE 1',
'PASIR RIS ST 13', 'ELIAS RD', 'BISHAN ST 24', 'WHAMPOA RD',
'WOODLANDS ST 41', 'PASIR RIS ST 71', 'JURONG WEST ST 74',
'PASIR RIS DR 1', 'PASIR RIS ST 72', 'PASIR RIS DR 10',
'CHOA CHU KANG ST 52', 'CLARENCE LANE', 'CHOA CHU KANG NTH 6',
'PASIR RIS ST 53', 'CHOA CHU KANG NTH 5', 'ANG MO KIO ST 21',
'JLN DAMAI', 'CHOA CHU KANG ST 62', 'WOODLANDS AVE 5',
'WOODLANDS DR 50', 'CHOA CHU KANG ST 53', 'TAMPINES ST 72',
'UPP SERANGOON RD', 'JURONG WEST ST 75', 'STRATHMORE AVE',
'ANG MO KIO ST 31', 'TAMPINES ST 34', 'YUNG AN RD',
'WOODLANDS AVE 4', 'CHOA CHU KANG NTH 7', 'ANG MO KIO ST 11',
'WOODLANDS AVE 9', 'YUNG LOH RD', 'CHOA CHU KANG DR',
'CHOA CHU KANG ST 54', 'REDHILL LANE', 'KANG CHING RD',
'TAH CHING RD', 'SIMEI ST 5', 'WOODLANDS DR 40', 'WOODLANDS DR 70',
'TAMPINES ST 71', 'WOODLANDS DR 42', 'SERANGOON NTH AVE 3',
'JELAPANG RD', 'BT BATOK ST 22', 'HOUGANG ST 91',
'WOODLANDS AVE 6', 'WOODLANDS CIRCLE', 'CORPORATION DR',
'LOMPANG RD', 'WOODLANDS DR 72', 'CHOA CHU KANG ST 64',
'BT BATOK ST 24', 'JLN TECK WHYE', 'WOODLANDS CRES',
'WOODLANDS DR 60', 'CHANGI VILLAGE RD', 'BT BATOK ST 25',
'HOUGANG AVE 9', 'JURONG WEST CTRL 1', 'WOODLANDS RING RD',
'CHOA CHU KANG AVE 5', 'TOH GUAN RD', 'JURONG WEST ST 61',
'WOODLANDS DR 14', 'HOUGANG ST 92', 'CHOA CHU KANG CRES',
'SEMBAWANG CL', 'CANBERRA RD', 'SEMBAWANG CRES', 'SEMBAWANG VISTA',
'COMPASSVALE WALK', 'RIVERVALE ST', 'WOODLANDS DR 62',
'SEMBAWANG DR', 'WOODLANDS DR 53', 'WOODLANDS DR 52',
'RIVERVALE WALK', 'COMPASSVALE LANE', 'RIVERVALE DR', 'SENJA RD',
'JURONG WEST ST 65', 'RIVERVALE CRES', 'WOODLANDS DR 44',
'COMPASSVALE DR', 'WOODLANDS DR 16', 'COMPASSVALE RD',
'WOODLANDS DR 73', 'HOUGANG ST 31', 'JURONG WEST ST 64',
'WOODLANDS DR 71', 'YISHUN ST 20', 'ADMIRALTY DR',
'COMPASSVALE ST', 'BEDOK RESERVOIR VIEW', 'YUNG SHENG RD',
'ADMIRALTY LINK', 'SENGKANG EAST WAY', 'ANG MO KIO ST 32',
'ANG MO KIO ST 52', 'BOON TIONG RD', 'JURONG WEST ST 62',
'ANCHORVALE LINK', 'CANBERRA LINK', 'COMPASSVALE CRES',
'CLEMENTI ST 12', 'MONTREAL DR', 'WELLINGTON CIRCLE',
'SENGKANG EAST RD', 'JURONG WEST AVE 3', 'ANCHORVALE LANE',
'SENJA LINK', 'EDGEFIELD PLAINS', 'ANCHORVALE DR', 'SEGAR RD',
'FARRER PK RD', 'PUNGGOL FIELD', 'EDGEDALE PLAINS',
'ANCHORVALE RD', 'CANTONMENT CL', 'JLN MEMBINA', 'FERNVALE LANE',
'JURONG WEST ST 25', 'CLEMENTI ST 11', 'PUNGGOL FIELD WALK',
'KLANG LANE', 'PUNGGOL CTRL', 'JELEBU RD', 'BUANGKOK CRES',
'WOODLANDS DR 75', 'BT BATOK WEST AVE 5', 'JELLICOE RD',
'PUNGGOL DR', 'JURONG WEST ST 24', 'SEMBAWANG WAY', 'FERNVALE RD',
'BUANGKOK LINK', 'FERNVALE LINK', 'JLN TIGA', 'YUAN CHING RD',
'COMPASSVALE LINK', 'MARINE PARADE CTRL', 'COMPASSVALE BOW',
'PUNGGOL RD', 'BEDOK CTRL', 'PUNGGOL EAST', 'SENGKANG CTRL',
'TAMPINES CTRL 7', 'SENGKANG WEST AVE', 'PUNGGOL PL',
'CANTONMENT RD', 'GHIM MOH LINK', 'SIMEI LANE', 'YISHUN ST 41',
'TELOK BLANGAH ST 31', 'JLN KAYU', 'LOR 1A TOA PAYOH',
'PUNGGOL WALK', 'SENGKANG WEST WAY', 'BUANGKOK GREEN',
'PUNGGOL WAY', 'YISHUN ST 31', 'TECK WHYE CRES', 'MONTREAL LINK',
'UPP SERANGOON CRES', 'SUMANG LINK', 'SENGKANG EAST AVE',
'YISHUN AVE 1', 'ANCHORVALE CRES', 'ANCHORVALE ST',
'TAMPINES CTRL 8', 'YISHUN ST 51', 'UPP SERANGOON VIEW',
'TAMPINES AVE 1', 'BEDOK RESERVOIR CRES', 'ANG MO KIO ST 61',
'DAWSON RD', 'FERNVALE ST', 'HOUGANG ST 32', 'TAMPINES ST 86',
'SUMANG WALK', 'CHOA CHU KANG AVE 7', 'KEAT HONG CL',
'JURONG WEST CTRL 3', 'KEAT HONG LINK', 'ALJUNIED AVE 2',
'CANBERRA CRES', 'SUMANG LANE', 'CANBERRA ST', 'ANG MO KIO ST 44'],
dtype=object)
df.flat_model.unique()
array(['improved', 'new generation', 'model a', 'standard', 'simplified',
'model a-maisonette', 'apartment', 'maisonette', 'terrace',
'2-room', 'improved-maisonette', 'multi generation',
'premium apartment', 'adjoined flat', 'premium maisonette',
'model a2', 'dbss', 'type s1', 'type s2', 'premium apartment loft'],
dtype=object)
labelencoder = LabelEncoder()
df['flat_model_label'] = labelencoder.fit_transform(df['flat_model'])
df
| month | town | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease | storey_first | storey_last | storey_avg | floor_area_sqft | flat_type_label | flat_model_label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990-01 | Ang Mo Kio | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | improved | 1977 | 9000.00 | NaN | 10 | 12 | 11.00 | 333.68 | 0 | 4 |
| 1 | 1990-01 | Ang Mo Kio | 309 | ANG MO KIO AVE 1 | 04 TO 06 | 31.00 | improved | 1977 | 6000.00 | NaN | 4 | 6 | 5.00 | 333.68 | 0 | 4 |
| 2 | 1990-01 | Ang Mo Kio | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | improved | 1977 | 8000.00 | NaN | 10 | 12 | 11.00 | 333.68 | 0 | 4 |
| 3 | 1990-01 | Ang Mo Kio | 309 | ANG MO KIO AVE 1 | 07 TO 09 | 31.00 | improved | 1977 | 6000.00 | NaN | 7 | 9 | 8.00 | 333.68 | 0 | 4 |
| 4 | 1990-01 | Ang Mo Kio | 216 | ANG MO KIO AVE 1 | 04 TO 06 | 73.00 | new generation | 1976 | 47200.00 | NaN | 4 | 6 | 5.00 | 785.76 | 2 | 11 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 867886 | 2022-03 | Yishun | 501B | YISHUN ST 51 | 07 TO 09 | 112.00 | improved | 2018 | 652000.00 | 94 years 11 months | 7 | 9 | 8.00 | 1205.56 | 4 | 4 |
| 867887 | 2022-03 | Yishun | 820 | YISHUN ST 81 | 07 TO 09 | 122.00 | improved | 1988 | 585000.00 | 65 years 06 months | 7 | 9 | 8.00 | 1313.20 | 4 | 4 |
| 867888 | 2022-03 | Yishun | 359 | YISHUN RING RD | 01 TO 03 | 145.00 | apartment | 1988 | 682000.00 | 65 years 05 months | 1 | 3 | 2.00 | 1560.77 | 5 | 2 |
| 867889 | 2022-03 | Yishun | 387 | YISHUN RING RD | 04 TO 06 | 146.00 | maisonette | 1988 | 720000.00 | 65 years 04 months | 4 | 6 | 5.00 | 1571.53 | 5 | 6 |
| 867890 | 2022-03 | Yishun | 277 | YISHUN ST 22 | 10 TO 12 | 146.00 | maisonette | 1985 | 788000.00 | 62 years 06 months | 10 | 12 | 11.00 | 1571.53 | 5 | 6 |
867891 rows × 16 columns
df.drop('flat_model', axis=1, inplace=True)
df.dtypes
month object town object block object street_name object storey_range object floor_area_sqm float64 lease_commence_date int64 resale_price float64 remaining_lease object storey_first int64 storey_last int64 storey_avg float64 floor_area_sqft float64 flat_type_label int32 flat_model_label int32 dtype: object
labelencoder = LabelEncoder()
df['town'] = labelencoder.fit_transform(df['town'])
df.town.unique()
array([ 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 19, 21,
22, 23, 24, 25, 26, 15, 20, 5, 17, 18])
df.drop('town', axis=1, inplace=True)
df.dtypes
month object block object street_name object storey_range object floor_area_sqm float64 lease_commence_date int64 resale_price float64 remaining_lease object storey_first int64 storey_last int64 storey_avg float64 floor_area_sqft float64 flat_type_label int32 flat_model_label int32 dtype: object
df
| month | block | street_name | storey_range | floor_area_sqm | lease_commence_date | resale_price | remaining_lease | storey_first | storey_last | storey_avg | floor_area_sqft | flat_type_label | flat_model_label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990-01 | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | 1977 | 9000.00 | NaN | 10 | 12 | 11.00 | 333.68 | 0 | 4 |
| 1 | 1990-01 | 309 | ANG MO KIO AVE 1 | 04 TO 06 | 31.00 | 1977 | 6000.00 | NaN | 4 | 6 | 5.00 | 333.68 | 0 | 4 |
| 2 | 1990-01 | 309 | ANG MO KIO AVE 1 | 10 TO 12 | 31.00 | 1977 | 8000.00 | NaN | 10 | 12 | 11.00 | 333.68 | 0 | 4 |
| 3 | 1990-01 | 309 | ANG MO KIO AVE 1 | 07 TO 09 | 31.00 | 1977 | 6000.00 | NaN | 7 | 9 | 8.00 | 333.68 | 0 | 4 |
| 4 | 1990-01 | 216 | ANG MO KIO AVE 1 | 04 TO 06 | 73.00 | 1976 | 47200.00 | NaN | 4 | 6 | 5.00 | 785.76 | 2 | 11 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 867886 | 2022-03 | 501B | YISHUN ST 51 | 07 TO 09 | 112.00 | 2018 | 652000.00 | 94 years 11 months | 7 | 9 | 8.00 | 1205.56 | 4 | 4 |
| 867887 | 2022-03 | 820 | YISHUN ST 81 | 07 TO 09 | 122.00 | 1988 | 585000.00 | 65 years 06 months | 7 | 9 | 8.00 | 1313.20 | 4 | 4 |
| 867888 | 2022-03 | 359 | YISHUN RING RD | 01 TO 03 | 145.00 | 1988 | 682000.00 | 65 years 05 months | 1 | 3 | 2.00 | 1560.77 | 5 | 2 |
| 867889 | 2022-03 | 387 | YISHUN RING RD | 04 TO 06 | 146.00 | 1988 | 720000.00 | 65 years 04 months | 4 | 6 | 5.00 | 1571.53 | 5 | 6 |
| 867890 | 2022-03 | 277 | YISHUN ST 22 | 10 TO 12 | 146.00 | 1985 | 788000.00 | 62 years 06 months | 10 | 12 | 11.00 | 1571.53 | 5 | 6 |
867891 rows × 14 columns
df.lease_commence_date.unique() # this is a YEAR, but should you label encode it, OR keep it as an year integer ???
array([1977, 1976, 1978, 1979, 1984, 1980, 1985, 1981, 1982, 1986, 1972,
1983, 1973, 1969, 1975, 1971, 1974, 1967, 1970, 1968, 1988, 1987,
1989, 1990, 1992, 1993, 1994, 1991, 1995, 1996, 1997, 1998, 1999,
2000, 2001, 1966, 2002, 2006, 2003, 2005, 2004, 2008, 2007, 2009,
2010, 2012, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
dtype=int64)
df.dtypes
month object block object street_name object storey_range object floor_area_sqm float64 lease_commence_date int64 resale_price float64 remaining_lease object storey_first int64 storey_last int64 storey_avg float64 floor_area_sqft float64 flat_type_label int32 flat_model_label int32 dtype: object
# fix or remove ?
# df['flat_type_labels'] = labelencoder.fit_transform(df['flat_type'])
len(df)
867891
df_backup = df.copy()
c = df.corr()
c
| floor_area_sqm | lease_commence_date | resale_price | storey_first | storey_last | storey_avg | floor_area_sqft | flat_type_label | flat_model_label | |
|---|---|---|---|---|---|---|---|---|---|
| floor_area_sqm | 1.00 | 0.43 | 0.60 | 0.05 | 0.05 | 0.05 | 1.00 | 0.95 | -0.41 |
| lease_commence_date | 0.43 | 1.00 | 0.52 | 0.15 | 0.15 | 0.15 | 0.43 | 0.45 | -0.21 |
| resale_price | 0.60 | 0.52 | 1.00 | 0.23 | 0.23 | 0.23 | 0.60 | 0.63 | -0.27 |
| storey_first | 0.05 | 0.15 | 0.23 | 1.00 | 1.00 | 1.00 | 0.05 | 0.08 | -0.08 |
| storey_last | 0.05 | 0.15 | 0.23 | 1.00 | 1.00 | 1.00 | 0.05 | 0.08 | -0.08 |
| storey_avg | 0.05 | 0.15 | 0.23 | 1.00 | 1.00 | 1.00 | 0.05 | 0.08 | -0.08 |
| floor_area_sqft | 1.00 | 0.43 | 0.60 | 0.05 | 0.05 | 0.05 | 1.00 | 0.95 | -0.41 |
| flat_type_label | 0.95 | 0.45 | 0.63 | 0.08 | 0.08 | 0.08 | 0.95 | 1.00 | -0.37 |
| flat_model_label | -0.41 | -0.21 | -0.27 | -0.08 | -0.08 | -0.08 | -0.41 | -0.37 | 1.00 |
sns.heatmap(c, annot=True)
plt.show();
# cleaner view
plt.figure(figsize=(16, 6))
mask = np.triu(np.ones_like(df.corr(), dtype=bool))
heatmap = sns.heatmap(df.corr(), mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Triangle Correlation Heatmap', fontdict={'fontsize':18}, pad=16);
sns.set(style='whitegrid')
plt.figure(figsize=(15,10))
sns.scatterplot(x="floor_area_sqm", y="resale_price", hue = "flat_type_label", palette='colorblind', data=df);
plt.figure(figsize=(15,10))
sns.histplot(data=df, x="resale_price", color="green", kde=True, bins=100)
plt.show();
plt.figure(figsize=(15,10))
sns.scatterplot(x="floor_area_sqm", y="resale_price",data=df);
p = figure(width=670, height=400, toolbar_location=None,
title="Normal (Gaussian) Distribution")
n = 1000
rng = np.random.default_rng(825914)
x = rng.normal(loc=4.7, scale=14.3, size=n)
# Scale random data so that it has mean of 0 and standard deviation of 1
xbar = x.mean()
sigma = x.std()
scaled = (x - xbar) / sigma
# Histogram
bins = np.linspace(-3, 3, 40)
hist, edges = np.histogram(scaled, density=True, bins=bins)
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
fill_color="skyblue", line_color="white",
legend_label=f"{n} random samples")
# Probability density function
x = np.linspace(-3.0, 3.0, 100)
pdf = np.exp(-0.5*x**2) / np.sqrt(2.0*np.pi)
p.line(x, pdf, line_width=2, line_color="navy",
legend_label="Probability Density Function")
p.y_range.start = 0
p.xaxis.axis_label = "x"
p.yaxis.axis_label = "PDF(x)"
p.xaxis.ticker = [-3, -2, -1, 0, 1, 2, 3]
p.xaxis.major_label_overrides = {
-3: TeX(r"\overline{x} - 3\sigma"),
-2: TeX(r"\overline{x} - 2\sigma"),
-1: TeX(r"\overline{x} - \sigma"),
0: TeX(r"\overline{x}"),
1: TeX(r"\overline{x} + \sigma"),
2: TeX(r"\overline{x} + 2\sigma"),
3: TeX(r"\overline{x} + 3\sigma"),}
p.yaxis.ticker = [0, 0.1, 0.2, 0.3, 0.4]
p.yaxis.major_label_overrides = {
0: TeX(r"0"),
0.1: TeX(r"0.1/\sigma"),
0.2: TeX(r"0.2/\sigma"),
0.3: TeX(r"0.3/\sigma"),
0.4: TeX(r"0.4/\sigma"),}
div = Div(text=r"""
A histogram of a samples from a Normal (Gaussian) distribution, together with
the ideal probability density function, given by the equation:
<p />
$$
\qquad PDF(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}
\left(\frac{x-\overline{x}}{\sigma}\right)^2 \right]
$$
""")
show(column(p, div))
# Create the blank plot
# p = figure(plot_height = 600, plot_width = 600,
# title = 'Histogram of Arrival Delays',
# x_axis_label = 'Delay (min)]',
# y_axis_label = 'Number of Flights')
# # Add a quad glyph
# p.quad(bottom=0, top=delays['flights'],
# left=delays['left'], right=delays['right'],
# fill_color='red', line_color='black')
# # Show the plot
# show(p)
corrMatrix = df.corr()
sns.heatmap(corrMatrix, annot=True)
plt.show()
<AxesSubplot:>
df.corr()
| floor_area_sqm | lease_commence_date | resale_price | storey_first | storey_last | storey_avg | floor_area_sqft | flat_type_label | flat_model_label | |
|---|---|---|---|---|---|---|---|---|---|
| floor_area_sqm | 1.00 | 0.43 | 0.60 | 0.05 | 0.05 | 0.05 | 1.00 | 0.95 | -0.41 |
| lease_commence_date | 0.43 | 1.00 | 0.52 | 0.15 | 0.15 | 0.15 | 0.43 | 0.45 | -0.21 |
| resale_price | 0.60 | 0.52 | 1.00 | 0.23 | 0.23 | 0.23 | 0.60 | 0.63 | -0.27 |
| storey_first | 0.05 | 0.15 | 0.23 | 1.00 | 1.00 | 1.00 | 0.05 | 0.08 | -0.08 |
| storey_last | 0.05 | 0.15 | 0.23 | 1.00 | 1.00 | 1.00 | 0.05 | 0.08 | -0.08 |
| storey_avg | 0.05 | 0.15 | 0.23 | 1.00 | 1.00 | 1.00 | 0.05 | 0.08 | -0.08 |
| floor_area_sqft | 1.00 | 0.43 | 0.60 | 0.05 | 0.05 | 0.05 | 1.00 | 0.95 | -0.41 |
| flat_type_label | 0.95 | 0.45 | 0.63 | 0.08 | 0.08 | 0.08 | 0.95 | 1.00 | -0.37 |
| flat_model_label | -0.41 | -0.21 | -0.27 | -0.08 | -0.08 | -0.08 | -0.41 | -0.37 | 1.00 |
# Metadata for Resale Flat Prices
---
Identifier: 7a339d20-3c57-4b11-a695-9348adfd7614
Name: resale-flat-prices
Title: Resale Flat Prices
Description:
- Resale transacted prices.
- Prior to March 2012, data is based on date of approval for the resale transactions.
- For March 2012 onwards, the data is based on date of registration for the resale
transactions.
Topics:
- Infrastructure
Keywords:
- Cost of Living
- HDB
- Housing
- Property
- Public Housing
- Resale Flats
Publisher:
Name: Housing and Development Board
Admin 1:
Name: Lau Pei Wen
Department: HDB
Email: LAU_Pei_Wen@hdb.gov.sg
Admin 2:
Name: Toh Xue Qin
Department: HDB
Email: toh_xue_qin@hdb.gov.sg
Sources:
- Housing and Development Board
License: https://data.gov.sg/open-data-licence
Frequency: Monthly
Coverage: 1990-01-01 to 2022-03-10
Last Updated: 2022-03-11T01:00:32.750114
Resources:
-
Identifier: f1765b54-a209-4718-8d38-a39237f502b3
Title: Resale flat prices based on registration date from Jan-2017 onwards
Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards-2022-03-11T01-00-31Z.csv
Format: CSV
Coverage: 2017-01-01 to 2022-03-10
Last Updated: 2022-03-11T01:00:31.456350
Schema:
-
Name: month
Title: Month
Type: datetime
Sub Type: month
Format: YYYY-MM
-
Name: town
Title: Town
Type: text
Sub Type: general
-
Name: flat_type
Title: Flat type
Type: text
Sub Type: general
-
Name: block
Title: Block
Type: text
Sub Type: general
-
Name: street_name
Title: Street name
Type: text
Sub Type: general
-
Name: storey_range
Title: Storey range
Type: text
Sub Type: general
-
Name: floor_area_sqm
Title: Floor area sqm
Type: numeric
Sub Type: general
Unit Of Measure: sqm
-
Name: flat_model
Title: Flat model
Type: text
Sub Type: general
-
Name: lease_commence_date
Title: Lease commence date
Type: datetime
Sub Type: year
Format: YYYY
-
Name: remaining_lease
Title: Remaining lease
Type: text
Sub Type: general
-
Name: resale_price
Title: Resale price
Type: numeric
Sub Type: general
Unit Of Measure: $
-
Identifier: 1b702208-44bf-4829-b620-4615ee19b57c
Title: Resale Flat Prices (Based on Registration Date), From Jan 2015 to Dec
2016
Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-jan-2015-to-dec-2016-2019-06-17T09-03-16Z.csv
Format: CSV
Coverage: 2015-01-01 to 2016-12-31
Description:
- Notes:
-
- 1. The approximate floor area includes any recess area purchased, space adding
item under HDB’s upgrading programmes, roof terrace, etc.
-
- 2. The transactions exclude resale transactions that may not reflect the
full market price such as resale between relatives and resale of part shares.
-
- 3. Resale prices should be taken as indicative only as the resale prices
agreed between buyers and sellers are dependent on many factors.
-
- 4. "Remaining lease" is the number of years left before the lease ends. This
information is computed as at the resale flat application.
Last Updated: 2019-06-17T09:03:16.240486
Schema:
-
Name: month
Title: Month
Type: datetime
Sub Type: month
Format: YYYY-MM
-
Name: town
Title: Town
Type: text
Sub Type: general
-
Name: flat_type
Title: Flat Type
Type: text
Sub Type: general
-
Name: block
Title: Block
Type: text
Sub Type: general
-
Name: street_name
Title: Street Name
Type: text
Sub Type: general
-
Name: storey_range
Title: Storey Range
Type: text
Sub Type: general
-
Name: floor_area_sqm
Title: Floor Area
Type: numeric
Sub Type: general
Unit Of Measure: Sqm
-
Name: flat_model
Title: Flat Model
Type: text
Sub Type: general
-
Name: lease_commence_date
Title: Lease Commencement Date
Type: datetime
Sub Type: year
Format: YYYY
-
Name: remaining_lease
Title: Remaining Lease
Type: numeric
Sub Type: general
Unit Of Measure: Years
-
Name: resale_price
Title: Resale Price
Type: numeric
Sub Type: general
Unit Of Measure: $
-
Identifier: 83b2fc37-ce8c-4df4-968b-370fd818138b
Title: Resale Flat Prices (Based on Registration Date), From Mar 2012 to Dec
2014
Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-mar-2012-to-dec-2014-2019-06-17T09-04-34Z.csv
Format: CSV
Coverage: 2012-03-01 to 2014-12-31
Description:
- Notes:
-
- 1. The approximate floor area includes any recess area purchased, space adding
item under HDB’s upgrading programmes, roof terrace, etc.
-
- 2. The transactions exclude resale transactions that may not reflect the
full market price such as resale between relatives and resale of part shares.
-
- 3. Resale prices should be taken as indicative only as the resale prices
agreed between buyers and sellers are dependent on many factors.
Last Updated: 2019-06-17T09:04:34.538580
Schema:
-
Name: month
Title: Month
Type: datetime
Sub Type: month
Format: YYYY-MM
-
Name: town
Title: Town
Type: text
Sub Type: general
-
Name: flat_type
Title: Flat Type
Type: text
Sub Type: general
-
Name: block
Title: Block
Type: text
Sub Type: general
-
Name: street_name
Title: Street Name
Type: text
Sub Type: general
-
Name: storey_range
Title: Storey Range
Type: text
Sub Type: general
-
Name: floor_area_sqm
Title: Floor Area
Type: numeric
Sub Type: general
Unit Of Measure: Sqm
-
Name: flat_model
Title: Flat Model
Type: text
Sub Type: general
-
Name: lease_commence_date
Title: Lease Commence Date
Type: datetime
Sub Type: year
Format: YYYY
-
Name: resale_price
Title: Resale Price
Type: numeric
Sub Type: general
Unit Of Measure: S$
-
Identifier: 8c00bf08-9124-479e-aeca-7cc411d884c4
Title: Resale Flat Prices (Based on Approval Date), 2000 - Feb 2012
Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-approval-date-2000-feb-2012-2019-06-28T10-14-13Z.csv
Format: CSV
Coverage: 2000-01-01 to 2012-02-29
Description:
- Notes:
-
- 1. The approximate floor area includes any recess area purchased, space adding
item under HDB’s upgrading programmes, roof terrace, etc.
-
- 2. The transactions exclude resale transactions that may not reflect the
full market price such as resale between relatives and resale of part shares.
-
- 3. Resale prices should be taken as indicative only as the resale prices
agreed between buyers and sellers are dependent on many factors.
Last Updated: 2019-06-28T10:14:13.202606
Schema:
-
Name: month
Title: Month
Type: datetime
Sub Type: month
Format: YYYY-MM
-
Name: town
Title: Town
Type: text
Sub Type: general
-
Name: flat_type
Title: Flat Type
Type: text
Sub Type: general
-
Name: block
Title: Block
Type: text
Sub Type: general
-
Name: street_name
Title: Street Name
Type: text
Sub Type: general
-
Name: storey_range
Title: Storey Range
Type: text
Sub Type: general
-
Name: floor_area_sqm
Title: Floor Area
Type: numeric
Sub Type: general
Unit Of Measure: Sqm
-
Name: flat_model
Title: Flat Model
Type: text
Sub Type: general
-
Name: lease_commence_date
Title: Lease Commence Date
Type: datetime
Sub Type: year
Format: YYYY
-
Name: resale_price
Title: Resale Price
Type: numeric
Sub Type: general
Unit Of Measure: S$
-
Identifier: adbbddd3-30e2-445f-a123-29bee150a6fe
Title: Resale Flat Prices (Based on Approval Date), 1990 - 1999
Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-approval-date-1990-1999-2021-05-25T02-49-29Z.csv
Format: CSV
Coverage: 1990-01-01 to 1999-12-31
Description:
- Notes:
-
- 1. The approximate floor area includes any recess area purchased, space adding
item under HDB’s upgrading programmes, roof terrace, etc.
-
- 2. The transactions exclude resale transactions that may not reflect the
full market price such as resale between relatives and resale of part shares.
-
- 3. Resale prices should be taken as indicative only as the resale prices
agreed between buyers and sellers are dependent on many factors.
Last Updated: 2021-05-25T02:49:29.941885
Schema:
-
Name: month
Title: Month
Type: datetime
Sub Type: month
Format: YYYY-MM
-
Name: town
Title: Town
Type: text
Sub Type: general
-
Name: flat_type
Title: Flat Type
Type: text
Sub Type: general
-
Name: block
Title: Block
Type: text
Sub Type: general
-
Name: street_name
Title: Street Name
Type: text
Sub Type: general
-
Name: storey_range
Title: Storey Range
Type: text
Sub Type: general
-
Name: floor_area_sqm
Title: Floor Area
Type: numeric
Sub Type: general
Unit Of Measure: Sqm
-
Name: flat_model
Title: Flat Model
Type: text
Sub Type: general
-
Name: lease_commence_date
Title: Lease Commence Date
Type: datetime
Sub Type: year
Format: YYYY
-
Name: resale_price
Title: Resale Price
Type: numeric
Sub Type: general
Unit Of Measure: S$
# print('Named explicitly:')
# for name in glob.glob('/home/geeks/Desktop/gfg/data.txt'):
# print(name)
# # Using '*' pattern
# print('\nNamed with wildcard *:')
# for name in glob.glob('/home/geeks/Desktop/gfg/*'):
# print(name)
# # Using '?' pattern
# print('\nNamed with wildcard ?:')
# for name in glob.glob('/home/geeks/Desktop/gfg/data?.txt'):
# print(name)
# # Using [0-9] pattern
# print('\nNamed with wildcard ranges:')
# for name in glob.glob('/home/geeks/Desktop/gfg/*[0-9].*'):
# print(name)
# # Python program to find files
# # recursively using Python
# # Returns a list of names in list files.
# print("Using glob.glob()")
# files = glob.glob('/home/geeks/Desktop/gfg/**/*.txt',
# recursive = True)
# for file in files:
# print(file)
# # It returns an iterator which will
# # be printed simultaneously.
# print("\nUsing glob.iglob()")
# for filename in glob.iglob('/home/geeks/Desktop/gfg/**/*.txt',
# recursive = True):
# print(filename)